Fast and Robust Least Squares Estimation in Corrupted Linear Models
Brian McWilliams, Gabriel Krummenacher, Mario Lucic, Joachim M. Buhmann
Subsampling methods have recently been proposed to speed up least squares estimation in large-scale settings. However, these algorithms are typically not robust to outliers or corruptions in the observed covariates. In this paper, we show that the concept of influence, developed for regression diagnostics, can be used to detect such corrupted observations. This property of influence, for which we also develop a randomized approximation, motivates our proposed subsampling algorithm for large-scale corrupted linear regression: the algorithm limits the influence of data points, since highly influential points contribute most to the residual error. Under a general model of corrupted observations, we show theoretically and empirically, on a variety of simulated and real datasets, that our algorithm improves over the current state-of-the-art approximation schemes for ordinary least squares. Published at the Neural Information Processing Systems (NIPS) Conference.
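The abstract names three ingredients: an influence score that flags corrupted rows, a randomized approximation, and subsampling that limits the influence of data points. Below is a minimal NumPy sketch of those ideas under stated assumptions; it is not the authors' algorithm. It uses the standard Cook's-distance-style score d_i = e_i^2 l_i / (1 - l_i)^2 (residual e_i, leverage l_i), a plain Gaussian sketch to approximate leverage (a swapped-in generic technique), and draws rows without replacement with probability proportional to 1/d_i before fitting ordinary least squares on them. All function names, the 5% corruption model in the demo, and parameters such as `m=200` are hypothetical. Note that the exact residuals here still require one full least squares fit; only the leverage step is approximated.

```python
import numpy as np


def leverage_scores(X):
    """Exact leverage l_i = [X (X^T X)^{-1} X^T]_{ii}, computed from a thin QR of X."""
    Q, _ = np.linalg.qr(X)  # reduced QR: Q is n x p with orthonormal columns
    return np.sum(Q**2, axis=1)


def approx_leverage_scores(X, sketch_rows, rng):
    """ASSUMPTION: randomized leverage estimate via a dense Gaussian sketch S X."""
    n = X.shape[0]
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    _, R = np.linalg.qr(S @ X)          # R is p x p and approximates the R of X
    Z = np.linalg.solve(R.T, X.T).T     # Z = X R^{-1}
    return np.sum(Z**2, axis=1)         # l_i is approximated by ||x_i^T R^{-1}||^2


def influence_scores(X, y, lev=None):
    """Cook's-distance-style influence d_i = e_i^2 l_i / (1 - l_i)^2 (up to a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                    # OLS residuals on the full data
    lev = leverage_scores(X) if lev is None else lev
    lev = np.minimum(lev, 1.0 - 1e-9)   # approximate scores can slightly exceed 1
    return e**2 * lev / (1.0 - lev) ** 2


def inverse_influence_subsample_ls(X, y, m, rng, lev=None):
    """ASSUMPTION: fit OLS on m rows drawn without replacement, P(row i) proportional to 1/d_i."""
    d = influence_scores(X, y, lev)
    w = 1.0 / (d + 1e-12)               # small offset guards against d_i == 0
    prob = w / w.sum()
    idx = rng.choice(X.shape[0], size=m, replace=False, p=prob)
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return beta


# Hypothetical demo: clean responses, corrupted observed covariates in 5% of rows.
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.standard_normal((n, p))
beta_true = np.arange(1.0, p + 1.0)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

bad = rng.choice(n, size=n // 20, replace=False)
X_obs = X.copy()
X_obs[bad] += 5.0 * rng.standard_normal((bad.size, p))

beta_ols, *_ = np.linalg.lstsq(X_obs, y, rcond=None)
beta_sub = inverse_influence_subsample_ls(X_obs, y, m=200, rng=rng)
print("OLS error:", np.linalg.norm(beta_ols - beta_true))
print("subsample error:", np.linalg.norm(beta_sub - beta_true))
```

Sampling without replacement is a deliberate choice in this sketch: because 1/d_i is heavy-tailed, drawing with replacement would let a handful of near-zero-influence rows dominate the subsample. Passing `lev=approx_leverage_scores(X_obs, 1000, rng)` swaps in the randomized leverage estimate.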